Extracting tree fragments in linear average time

Author

  • Andreas van Cranenburgh
Abstract

This report details the implementation of a fragment extraction algorithm that uses an average-case linear-time tree kernel. Given a treebank, the algorithm extracts all fragments that occur at least twice, along with their frequencies. Evaluation shows a -fold speedup over a quadratic fragment extraction implementation. Additionally, we add support for trees with discontinuous constituents.
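To make the extraction task concrete, here is a minimal Python sketch under stated assumptions; it is not the report's implementation. The tree encoding and every function name (`nodes`, `production`, `common_fragment`, `occurs_at`, `extract_fragments`) are hypothetical. The sorted-production matching used by fast tree kernels is swapped for a hash join on productions, which is also linear on average. For each matching node pair, the sketch simply takes the largest common fragment rooted there. Discontinuous constituents are not handled.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical encoding (not from the report): a nonterminal node is a tuple
# (label, child, ...); a terminal is a str. Every nonterminal in the treebank
# is assumed to have at least one child, so a 1-tuple (label,) can mark a
# frontier nonterminal (substitution site) inside a fragment.

def nodes(tree):
    """All nonterminal nodes of `tree`, in preorder."""
    result = [tree]
    for child in tree[1:]:
        if isinstance(child, tuple):
            result.extend(nodes(child))
    return result

def production(node):
    """The CFG production at a node: its label and its children's labels."""
    return (node[0], tuple(c if isinstance(c, str) else (c[0],) for c in node[1:]))

def common_fragment(a, b):
    """Largest fragment rooted at nodes a and b, which share a production."""
    children = []
    for ca, cb in zip(a[1:], b[1:]):
        if isinstance(ca, str):
            children.append(ca)                  # terminals are equal here
        elif production(ca) == production(cb):
            children.append(common_fragment(ca, cb))
        else:
            children.append((ca[0],))            # frontier nonterminal
    return (a[0],) + tuple(children)

def occurs_at(frag, node):
    """True if fragment `frag` matches the subtree rooted at `node`."""
    if isinstance(frag, str) or isinstance(node, str):
        return frag == node
    if frag[0] != node[0]:
        return False
    if len(frag) == 1:                           # frontier: any expansion fits
        return True
    return len(frag) == len(node) and all(
        occurs_at(f, n) for f, n in zip(frag[1:], node[1:]))

def extract_fragments(treebank):
    """Map each fragment shared by two trees to its occurrence count."""
    found = set()
    for t1, t2 in combinations(treebank, 2):
        # Hash join on productions: linear time on average, standing in for
        # the sorted-production matching of a fast tree kernel.
        index = defaultdict(list)
        for n2 in nodes(t2):
            index[production(n2)].append(n2)
        for n1 in nodes(t1):
            for n2 in index.get(production(n1), []):
                found.add(common_fragment(n1, n2))
    # Naive frequency count: test every fragment at every node of every tree.
    counts = {frag: sum(occurs_at(frag, n) for t in treebank for n in nodes(t))
              for frag in found}
    return {frag: c for frag, c in counts.items() if c >= 2}

treebank = [
    ("S", ("NP", "John"), ("VP", ("V", "sees"), ("NP", "Mary"))),
    ("S", ("NP", "Mary"), ("VP", ("V", "sees"), ("NP", "John"))),
    ("S", ("NP", "Mary"), ("VP", ("V", "walks"))),
]
for frag, count in extract_fragments(treebank).items():
    print(count, frag)
```

On this three-tree toy treebank the sketch reports seven fragments. `("NP", "Mary")` and `("S", ("NP",), ("VP",))` each occur three times. The frequency count rescans the whole treebank for every fragment, which is quadratic and intended only for illustration.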


Similar resources

A method for analyzing the problem of determining the maximum common fragments of temporal directed tree, that do not change with time

In this study two actual types of problems are considered and solved: 1) determining the maximum common connected fragment of the T-tree (T-directed tree) which does not change with time; 2) determining all maximum common connected fragments of the T-tree (T-directed tree) which do not change with time. The choice of the primary study of temporal directed trees and trees is justified by the wid...


Discontinuous Parsing with an Efficient and Accurate DOP Model

We present a discontinuous variant of tree-substitution grammar (tsg) based on Linear Context-Free Rewriting Systems. We use this formalism to instantiate a Data-Oriented Parsing model applied to discontinuous treebank parsing, and obtain a significant improvement over earlier results for this task. The model induces a tsg from the treebank by extracting fragments that occur at least twice. We g...


Извлечение низкочастотных терминов из специализированных текстов (Extraction of Low-Frequent Terms from Domain-Specific Texts)

We examined a method for extracting the low frequency important single-word terms from domain specific text. Firstly, domain-relevant fragments were extracted from the text with the help of a dependency tree. Then the fragments were clustered and candidate terms were defined using the semantic classifier. The studies suggest that this approach allows extracting even terms with a single occurrence.


LZ77 Factorisation of Trees

We generalise the fundamental concept of LZ77 factorisation from strings to trees. A tree is represented as a collection of edge-disjoint fragments that either consist of one node or has already occurred earlier (in the BFS order). Similarly as for strings, such a collection uniquely determines the tree, so by minimising the number of fragments we obtain a compressed representation of the tree....


An improved algorithm to reconstruct a binary tree from its inorder and postorder traversals

It is well-known that, given inorder traversal along with one of the preorder or postorder traversals of a binary tree, the tree can be determined uniquely. Several algorithms have been proposed to reconstruct a binary tree from its inorder and preorder traversals. There is one study to reconstruct a binary tree from its inorder and postorder traversals, and this algorithm takes running time of...




Publication date: 2012